-
Notifications
You must be signed in to change notification settings - Fork 3
Update synthea and wdi metadata #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6b1b7a8 to
e028c1d
Compare
john-sanchez31
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing this? For #457 I'm using synthea metadata from s3 so it will be no longer needed in the repo (actually this metadata will be no longer in the repo). Also I think we should wait for the new [run custom] flag which runs custom tests so this can be tested properly.
| "properties": [ | ||
| { | ||
| "name": "ITEM", | ||
| "name": "item", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why all properties are being change from uppercase to lowecase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names generated by the metadata tool are lowercase. No manual changes were done to htese files.
| "name": "year_", | ||
| "type": "table column", | ||
| "column name": "Year", | ||
| "column name": "\"Year\"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata generation tool have Python, Pydough and SQL keyword lists. If a table path or column name value matches that list or have special characters then the value is double-quote enclosed and any double-quote character in the name is escaped. With the quoting fix in Pydough this should not be an issue.
|
I agree with John, do we need that after the changes he made in this PR? |
This is actually true. With reserved words dataset and tests from s3 we no longer need synthea and WDI datasets on our repo anymore. We can talk about this. Meanwhile, the purpose of the PR was to update the "original" json files with the latest version used by LLM team, generated by the tool and enriched with help of the BIRD csv files. |
The intention of the PR is to use the latest version of the json used also by the LLM team. Also agree that we should not have |
|
@juankx-bodo I'm still confused why we need this PR at all since we shouldn't be using the JSON files in the repo, they get downloaded from S3. |
Update metadata with AI enriched version based on latest metadata generation tool version
Sample values and all definitions are based on the actual BIRD full dataset.